Learning with Low-Quality Data: Multi-View Semi-Supervised Learning with Missing Views

نویسنده

  • Brian Quanz
چکیده

The focus of this thesis is on learning approaches for what we call “low-quality data” and in particular data in which only small amounts of labeled target data is available. The first part provides background discussion on low-quality data issues, followed by preliminary study in this area. The remainder of the thesis focuses on a particular scenario: multi-view semi-supervised learning. Multi-view learning generally refers to the case of learning with data that has multiple natural views, or sets of features, associated with it. Multi-view semi-supervised learning methods try to exploit the combination of multiple views along with large amounts of unlabeled data in order to learn better predictive functions when limited labeled data is available. However, lack of complete view data limits the applicability of multi-view semisupervised learning to real world data. Commonly, one data view is readily and cheaply available, but additionally views may be costly or only available in some cases. This thesis work aims to make multi-view semi-supervised learning approaches more applicable to real world data specifically by addressing the issue of missing views through both feature generation and active learning, and addressing the issue of model selection for semi-supervised learning with limited labeled data. This thesis introduces a unified approach for handling missing view data in multi-view semi-supervised learning tasks, which applies to both data with completely missing additional views and data only missing views in some instances. The idea is to learn a feature generation function mapping one view to another with the mapping biased to

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization

We address the problem of learning classifiers when observations have multiple views, some of which may not be observed for all examples. We assume the existence of view generating functions which may complete the missing views in an approximate way. This situation corresponds for example to learning text classifiers from multilingual collections where documents are not available in all languag...

متن کامل

Active + Semi-supervised Learning = Robust Multi-View Learning

In a multi-view problem, the features of the domain can be partitioned into disjoint subsets (views) that are sufficient to learn the target concept. Semi-supervised, multi-view algorithms, which reduce the amount of labeled data required for learning, rely on the assumptions that the views are compatible and uncorrelated (i.e., every example is identically labeled by the target concepts in eac...

متن کامل

A Reconstruction Error Formulation for Semi-Supervised Multi-task and Multi-view Learning

A significant challenge to make learning techniques more suitable for general purpose use is to move beyond i) complete supervision, ii) low dimensional data, iii) a single task and single view per instance. Solving these challenges allows working with “Big Data” problems that are typically high dimensional with multiple (but possibly incomplete) labelings and views. While other work has addres...

متن کامل

Missing Value Imputation Using a Semi-supervised Rank Aggregation Approach

One relevant problem in data quality is the presence of missing data. In cases where missing data are abundant, effective ways to deal with these absences could improve the performance of machine learning algorithms. Missing data can be treated using imputation. Imputation methods replace the missing data by values estimated from the available data. This paper presents Corai, an imputation algo...

متن کامل

Multi-view Semi-supervised Learning: An Approach to Obtain Different Views from Text Datasets

The supervised machine learning approach usually requires a large number of labelled examples to learn accurately. However, labelling can be a costly and time consuming process, especially when manually performed. In contrast, unlabelled examples are usually inexpensive and easy to obtain. This is the case for text classification tasks involving on-line data sources, such as web pages, email an...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012